Entry Name: UKN-Hundt-MC2

VAST Challenge 2014
Mini-Challenge 2

 

Team Members:
Michael Hundt, University of Konstanz, michael.hundt@uni-konstanz.de        PRIMARY

Manuel Wildner, University of Konstanz, manuel.wildner@uni-konstanz.de

Natascha Siirak, University of Konstanz, natascha.siirak@uni-konstanz.de

 

Student Team:  YES

 

Analytic Tools Used:

KNIME, developed by the University of Konstanz (https://www.knime.org/)

QGIS, a free and open-source cross-platform desktop geographic information system (http://www.qgis.org/de/site/)

JMP, a statistics computer program developed by the JMP business unit of SAS

DayOfThePOK, our own interactive visualization and exploration tool, privately developed by N.Siirak, M.Hundt and M.Wildner in Konstanz, Germany

 

Approximately how many hours were spent working on this submission in total?        450

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2014 is complete?       YES

 

Video:

http://www.youtube.com/watch?v=THLB7JKYThM

 

UKON-MC2

 

 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

 

MC2.1 – Describe common daily routines for GAStech employees. What does a day in the life of a typical GAStech employee look like?  Please limit your response to no more than five images and 300 words.

 

The typical employee, John, works five days a week. Over the course of the day his behavior becomes less and less predictable. His working day starts with a coffee: together with some colleagues, he regularly enters his favorite coffee bar at about 7:30.

At 8:30 the first working session at GAStech starts; it lasts until the lunch break at about 13:00. Being very hungry, John drives to a restaurant, which he chooses anew each day. Once in a while he combines his lunch break with some shopping for the office. At 14:00 the break is over. A little late, but satisfied, he gets back to work. At about 17:30 he looks at his watch and thinks: "Oh, beer o'clock, but I still want to finish this." There is no fixed time at which he stops working, and his evening plans vary.

Sometimes John drives home first before dinner; sometimes he continues his activities and goes out directly.

And sometimes, on his way home, he stops at a grocery store or at a fast-food or other restaurant, or goes shopping (clothing, car supplies, supermarket etc.), possibly followed by other activities.


Overview of each car (X) over all days (Y):

As one can see, most cars are driven every day. The clear exceptions are cars no. 101-107 (trucks) and no. 31 (Sanjorge Jr., the boss, who arrives in the second week).

Movement-Overview:

This picture shows distinct patterns in the times at which cars are used. On workdays there is movement in the morning, at noon and in the evening.

On weekends there is almost no movement in the morning, as people are not going to work or getting their morning coffee.

Date-Time-Plots: Breakfast, Lunch, Beer o'clock

 

 

 

MC2.2 – Identify up to twelve unusual events or patterns that you see in the data. If you identify more than twelve patterns during your analysis, focus your answer on the patterns you consider to be most important for further investigation to help find the missing staff members. For each pattern or event you identify, describe

a.       What is the pattern or event you observe?

b.      Who is involved?

c.       What locations are involved?

d.      When does the pattern or event take place?

e.      Why is this pattern or event significant?

f.        What is your level of confidence about this pattern or event?  Why?

 

Please limit your answer to no more than twelve images and 1500 words.


Definitions:


We defined three levels each for the GPS data and for the card data:

CCLC: credit card and loyalty card agree in date, name, location and price; both cards were used for the same purchase.

CC: all other credit card expenses, which have a date and a time

LC: all other loyalty card expenses, which only have a date

Stop-Locations: cars 1-35 are each assigned to one person. From the GPS tracks we derived stops for each of them. These data are quite reliable.

Truck-Locations: truck drivers share a pool of trucks (IDs 101, 104-107). From the GPS tracks we derived stops, but they cannot be matched unambiguously to one person.

No GPS: there are persons or events for which we do not have any GPS data, e.g. only credit card 'stops'.
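
The three card levels above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the field names (name, location, price, date, time) and the sample records are hypothetical, and the matching simply compares the four fields that both card types share.

```python
from collections import Counter

def match_key(row):
    # Loyalty records carry no time of day, so only these four fields are compared.
    return (row["name"], row["location"], row["price"], row["date"])

def classify_transactions(cc_rows, lc_rows):
    """Return (label, row) pairs with label in {"CCLC", "CC", "LC"}."""
    # Multiset of loyalty records still waiting for a credit card partner.
    open_lc = Counter(match_key(r) for r in lc_rows)
    labelled = []
    for row in cc_rows:
        k = match_key(row)
        if open_lc[k] > 0:
            open_lc[k] -= 1
            labelled.append(("CCLC", row))
        else:
            labelled.append(("CC", row))
    # Loyalty records that found no credit card partner remain LC.
    for row in lc_rows:
        k = match_key(row)
        if open_lc[k] > 0:
            open_lc[k] -= 1
            labelled.append(("LC", row))
    return labelled

# Hypothetical sample records, not taken from the challenge data.
cc = [
    {"name": "Person A", "location": "Shop X", "price": 12.5,
     "date": "2014-01-06", "time": "07:35"},
    {"name": "Person B", "location": "Shop Y", "price": 40.0,
     "date": "2014-01-06", "time": "13:10"},
]
lc = [
    {"name": "Person A", "location": "Shop X", "price": 12.5, "date": "2014-01-06"},
    {"name": "Person C", "location": "Shop X", "price": 8.0, "date": "2014-01-07"},
]
labels = [label for label, _ in classify_transactions(cc, lc)]
print(labels)
```

A multiset is used so that two identical purchases on the same day are matched one-to-one instead of being collapsed.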


We express our confidence as a pair of levels (e.g. [2,2]), the first for the GPS data and the second for the card data. This notation keeps the two sources distinguishable, and the two numbers can also be added to give a confidence score, e.g. [2,3] = score 5.

GPS data | GPS level | Card level | Card data
Stop-Locations | 2 | 3 | CCLC
Truck-Locations | 1 | 2 | CC
No GPS | 0 | 1 | LC
-- | 0 | 0 | --
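
As a small worked example, the mapping from data sources to a confidence pair and score can be written as a lookup. This is a sketch of the scheme defined above; the function name and the use of None for "no card data" are our own choices for illustration.

```python
# Levels as defined in the mapping above.
GPS_LEVEL = {"Stop-Locations": 2, "Truck-Locations": 1, "No GPS": 0}
CARD_LEVEL = {"CCLC": 3, "CC": 2, "LC": 1, None: 0}  # None: no card data

def confidence(gps_source, card_source=None):
    """Return the pair [gps_level, card_level] and its sum, the score."""
    pair = [GPS_LEVEL[gps_source], CARD_LEVEL[card_source]]
    return pair, sum(pair)

print(confidence("Stop-Locations", "CCLC"))   # ([2, 3], 5)
print(confidence("Truck-Locations", "CC"))    # ([1, 2], 3)
print(confidence("Stop-Locations"))           # ([2, 0], 2)
```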


1)

suspicious stops

a) Four car IDs appear at unknown locations between 11:00 and 13:00, often alone, sometimes in groups.

b) Inga Ferro, Loreto Bodrogi, Hennie Osvaldo, Minke Mies

c) Unknown locations, followed by meetings at some higher-class cafés (green star)

d) Almost every day

e) These members of the security staff are suspicious: they hold very strange meetings in the middle of nowhere, at places we did not know about before.

f) The stops overlap. Unless somebody else drove the cars or somebody traveled with them, we can be sure that at least these four employees meet regularly. LEVEL: [2,0] = score 2.


2)

truck routes

a) Four trucks drive to the airport and to factories, but truck 105 also stops at Katerina's Cafe.

b) One or more truck drivers and the people they meet at the cafe

c) Katerina's Cafe as an outlier

d) On Tuesdays, Wednesdays and Thursdays between 11:30 and 13:30

e) Trucks are for business use only. This appears to be personal use.

f) We have the GPS data of truck 105 and some purchases by Valeria Morlun. LEVEL: [1,2] = score 3.


3)

not at home

a) Hennie Osvaldo (ID 21) regularly spends the night away from home, a 'one-way visit' pattern.

b) Hennie Osvaldo, Lidelse Dedos or Birgitta Frente

c) Their homes

d) We examined the interval from 17:00 to 5:00 the next morning to see where people stayed overnight.

e) As seen in 1), Hennie Osvaldo belongs to a group that aroused our interest. This pattern hints at whom Hennie is close to; in this case a romantic relationship with one of the two women can be assumed.

f) The regularity is striking and indicates a weekly schedule. The home locations are taken from the Stop-Locations. LEVEL: [2,0] = score 2.


4) Late-night meetings:

late night

a) Meetings in the middle of the night on different dates at different home locations. Three times, one "guest" arrives just as the other "guest" leaves. This may be a surveillance operation.

b) Ingrid Barranco (4), Ada Campo-Corrente (10), Loreto Bodrogi (15), Isia Vann (16), Hennie Osvaldo (21), Minke Mies (24), Orhan Strum (32), Willem Vasco-Pais (35)

c) At four different home locations

d) Late at night and early in the morning, on the 7th, 9th, 11th and 14th of January.

e) This pattern is significant because people normally sleep at night. Furthermore, we again come across names that have already shown suspicious behavior.

f) Stop-Locations, LEVEL: [2,0] = score 2.


5)

engineer-party

a) There is a big gathering of engineers at Lars Azada's home.

b) Seventeen people, almost all of them engineers or technicians.

c) At Lars Azada's home

d) The 10th of January; the party started at about 19:00.

e) It is unclear what kind of meeting this is, but such a large gathering of engineers and technicians at a private home is worth mentioning. Only one engineer is missing: Kare Orilla (27), who drove home after work and was not picked up by a tracked car.

f) Stop-Locations, LEVEL: [2,0] = score 2.


6)


a) Late in the evening, Bertrand Ovan (29) drives a large loop and stops at several cafés.

b) Bertrand Ovan, maybe also Anda Ribera, Linnea Bergen, Stenig Fusil, Elsa Orilla

c) The streets, Guy's Gyros, Ouzeri Elian, Kalami Kafenion, Hippokampos, U-Pump

d) On the 11th of January between 22:00 and 24:00.

e) This behavior stands out strongly. Furthermore, the 11th seems to be a date of interest.

f) The GPS data is quite reliable: Bertrand Ovan, LEVEL: [2,0] = score 2. There is no other car movement or stop. Furthermore, nobody bought anything by credit card during this time range, but some loyalty card expenses of this day are still unmatched. All other names mentioned are therefore possible, but very uncertain, LEVEL: [0,1] = score 1. Anda Ribera does not have a company car.


7)

kronos capitol

a) A gathering at the Kronos Capitol is followed by some more meetings.

b) Loreto Bodrogi (15), Kanon Herrero (22), Adra Nubarron (25), Elsa Orilla (28), Edvard Vann (34)

c) Mainly the Kronos Capitol; later Kalami Kafenion, Ahaggo Museum, Katerina's Cafe, Hippokampos

d) Saturday the 18th of January, starting at 10:10 at the Kronos Capitol

e) It is the Kronos Capitol itself that makes this significant. Car 25 stays there overnight, so presumably Adra Nubarron does too: Adra takes part in none of the meetings that follow, so it seems that either Adra stayed overnight or something else happened at the Capitol. All the subsequent meetings deserve close attention.

f) Depending on whom they met afterwards, our confidence levels range from [2,0] to [2,3].


8)


a) During the two weeks, Sten Sanjorge Jr. does not live on the island.

b) Willem Vasco-Pais .. see picture above.

c) Abila Zacharo, Desafio Golf Course, Chostus Hotel, Hallowed Grounds, Guy's Gyros, Katerina's Cafe

d) 17th to 19th of January.

e) The junior boss is present for only three days, which is significant in itself. Furthermore, a closer look gives better insight into relationships and locations of interest.

f) The places where Sanjorge Jr. stayed are very certain: the GPS data is verified by credit card, LEVEL: [2,2] = score 4. As for whom he met, our confidence levels range from [0,1] to [2,3].


9)


a) Late truck movement to the airport without spending money (LEVEL: [1,0] = score 1), a CC-Stop (LEVEL: [0,2] = score 2) at "Carlyle Chemical Inc." by Valeria Morlun, and a regular meeting at "Katerina's Cafe" (LEVEL: from [1,1] to [2,2])

b) Valeria Morlun, who is a truck driver, and truck ids 104, 105 and 106. Present for the regular meeting are: Dante Coginian, Hennie Osvaldo, Ruscella Mies Haber, Bertrand Ovan, Emile Arpa, Carla Forluniau

c) Airport, Katerina's Cafe, Carlyle Chemical Inc.

d) The 9th and especially the 16th of January

e) Why is there no truck stop at "Carlyle Chemical Inc."? Whom did they pick up at the airport? The junior boss?

f) LEVEL: from [1,0] to [2,2], see above.


10)


a) Stop durations greater than 400 min.

b) Axel Calzas, truck 107, Adra Nubarron, Hennie Osvaldo, Sten S. Jr.

c) Hippokampos, Albert's Fine Clothing, Kronos Capitol, Chostus Hotel

d) The 6th and 12th for Axel Calzas, the 18th for Adra Nubarron, the 11th for Hennie Osvaldo, and the 17th and 18th for the junior boss.

e) This deviates strongly from typical behavior: a truck would normally be parked at GAStech, and an employee's car would normally stay at home overnight.

f) Stop-Locations, from at least LEVEL: [2,0] = score 2 up to verified stops with LEVEL: [2,3] = score 5
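
A filter of this kind could be sketched in Python as follows. Only the 400-minute threshold comes from the text above; the record layout, the generic place names and the idea of passing the expected overnight places as a parameter are assumptions made for illustration.

```python
from datetime import datetime

def long_unusual_stops(stops, usual_places, min_minutes=400):
    """Return (car_id, place, minutes) for long stops away from the usual places."""
    flagged = []
    for stop in stops:
        minutes = (stop["end"] - stop["start"]).total_seconds() / 60
        # Long stops at home or at the company are expected and therefore skipped.
        if minutes > min_minutes and stop["place"] not in usual_places:
            flagged.append((stop["car_id"], stop["place"], round(minutes)))
    return flagged

# Hypothetical stop records, not taken from the challenge data.
stops = [
    {"car_id": 1, "place": "home",
     "start": datetime(2014, 1, 6, 18, 0), "end": datetime(2014, 1, 7, 7, 0)},
    {"car_id": 2, "place": "hotel",
     "start": datetime(2014, 1, 6, 20, 0), "end": datetime(2014, 1, 7, 8, 0)},
    {"car_id": 3, "place": "cafe",
     "start": datetime(2014, 1, 6, 12, 0), "end": datetime(2014, 1, 6, 13, 0)},
]
flagged = long_unusual_stops(stops, usual_places={"home", "GAStech"})
print(flagged)
```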


11)


a) Elsa Orilla pays in cash three times more often than anybody else (which sounds rather German), Lucas Alcazar has one extreme expense outlier of 10,000 €, and the lists below show our top 10 spenders.

b)

Elsa Orilla (28, Drill Technician), Lucas Alcazar (1, IT Helpdesk)

CCLC: Varro Awelon (!), Irene Nant (Facilities), Felix Resumir (30, Security), Ada Campo-Corrente (10, Executive), Sven Flecha (17, IT-Technician), Edvard Vann (34, Security), Lars Azada (2, Engineer), Henk Mies (Facilities), Orhan Strum (32, Executive), Valeria Morlun (Facilities)

CC: Kanon Herrero (22, Security), Varja Lagos (23, Security), Hideki Cocinaro (12, Security), Isande Borrasca (7, Drill Technician), Isia Vann (16, Security), Cornelia Lais (!), Bertrand Ovan (29, Facilities.. only one with a car), Vira Frente (19, Hydraulic Technician), Carla Forluniau (!), Loreto Bodrogi (15, Security)

c) Frydos Autosupply n' More; the other large deviations are mainly facility-related: Abila Airport, Abila Scrapyard, Maximum Iron and Steel, Kronos Pipe and Irrigation, Carlyle Chemical Inc., Nationwide Refinery, Stewart and Sons Fabrication

d) Over all fourteen days.

e) This brings some almost unknown people into focus, such as Varro Awelon or Cornelia Lais. Furthermore, Elsa Orilla (see 2.3) clearly stands out: why does she pay mainly in cash? It is also interesting that so many security staff are in this group. Truck drivers also spend a lot, but mostly on job-related purchases during working hours.

f) Expenses, LEVEL: [0,1] to [0,3]


12)

a) Lucas Alcazar drives alone to GAStech in the middle of the night.

b) Lucas Alcazar

c) GAStech, Home of Lucas Alcazar

d) Three nights, each between 22:00 and 2:00

e) Usually a company is closed at night.

f) Stop-Locations, LEVEL: [2,0] = score 2

 

 

 

MC2.3 – Like most datasets, the data you were provided is imperfect, with possible issues such as missing data, conflicting data, data of varying resolutions, outliers, or other kinds of confusing data.  Considering MC2 data is primarily spatiotemporal, describe how you identified and addressed the uncertainties and conflicts inherent in this data to reach your conclusions in questions MC2.1 and MC2.2.  Please limit your response to no more than five images and 300 words.

 

 

List of data cleaning steps:

 

Type | Name | Identification | Transformation
Outlier | Daily Dealz | Count shoppings | Drop
Outlier | Coffee Shack | Count shoppings, customers | Drop
Outlier | Single GPS-Signals | Multiple car-stops at same location without movement | Merge successive stops
Outlier | 10000-purchase | Sort by price | Suspicious, keep it
Different Resolution | Timestamp CC-/LC-data | Join not possible, LC-time is missing | Join other values instead: Name, Location, Price, Date
Conflicting | Money spent | Compute difference between CC- and Loyalty-purchases (same location/date/customer) | Take values of CC, because payment is more secure
Conflicting | Apostrophe | Katerina`s Café causes format errors | Replace by Katerina's Cafe
Conflicting | Date: Kronos Mart | CC-/LC-card join fails | Assign LC-date to the next day
Missing | GPS-Car31 | No GPS data visible before the 17th | No CC/Loyalty-data available too => no transformation possible
Missing | GPS-Car9 and 107 | Stops suddenly and continues movement somewhere else | Too many hours missing for a reliable computation, keep discovery in mind
Missing | Exact Position of Locations | Only tourist-map given | Join stops with purchases using CarID and assign cluster centers to locations
Missing | ID: Truck Driver and People Without Car | Remaining values after joining CC-data&Car-assignment | Assign new IDs and generate short stops out of CC-data
Shifted | GPS-Car28 | Look at longitude-latitude-visualization | Shift all values by the mean deviation between car-stop and shop position
Shifted | 12:00-shops | Try to find corresponding stops | Shift time of purchase to morning car-stop (except for Carlyle Chemical Inc, 13th, 12:00)
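
The "Merge successive stops" transformation for single GPS signals could look roughly like the following Python sketch. The record layout and sample values are hypothetical, and the rule (merge neighbouring stops of the same car at the same place) is a simplified reading of the step described in the list.

```python
from datetime import datetime

def merge_successive_stops(stops):
    """Merge neighbouring stops of one car at one place.

    stops: list of dicts (car_id, place, start, end), sorted by car_id and start.
    """
    merged = []
    for stop in stops:
        last = merged[-1] if merged else None
        if last and last["car_id"] == stop["car_id"] and last["place"] == stop["place"]:
            # No movement in between: extend the previous stop.
            last["end"] = max(last["end"], stop["end"])
        else:
            merged.append(dict(stop))  # copy, so the input stays untouched
    return merged

# Hypothetical stop records, not taken from the challenge data.
day = (2014, 1, 6)
raw_stops = [
    {"car_id": 1, "place": "home",
     "start": datetime(*day, 8, 0), "end": datetime(*day, 8, 5)},
    {"car_id": 1, "place": "home",
     "start": datetime(*day, 8, 5), "end": datetime(*day, 12, 0)},
    {"car_id": 1, "place": "cafe",
     "start": datetime(*day, 12, 10), "end": datetime(*day, 13, 0)},
]
merged_stops = merge_successive_stops(raw_stops)
print(len(merged_stops), merged_stops[0]["end"])
```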

 

Time assignment of 12h-shops  

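
The time assignment for purchases stamped 12:00 can be sketched in Python as below. This is an illustration under assumptions: the record layout is hypothetical, "morning" is taken to mean a stop starting before 12:00 on the same day, the purchase is moved to the start of the earliest such stop, and known exceptions are passed in as (place, date) pairs.

```python
from datetime import datetime, time

NOON = time(12, 0)

def reassign_noon_purchases(purchases, stops, exceptions=frozenset()):
    """Move purchases stamped exactly 12:00 to a matching morning car stop."""
    fixed = []
    for p in purchases:
        p = dict(p)  # work on a copy
        is_noon = p["when"].time() == NOON
        if is_noon and (p["place"], p["when"].date()) not in exceptions:
            candidates = [
                s["start"] for s in stops
                if s["car_id"] == p["car_id"]
                and s["place"] == p["place"]
                and s["start"].date() == p["when"].date()
                and s["start"].time() < NOON
            ]
            if candidates:  # without a matching stop the purchase stays as it is
                p["when"] = min(candidates)
        fixed.append(p)
    return fixed

# Hypothetical records, not taken from the challenge data.
purchases = [
    {"car_id": 5, "place": "shop", "when": datetime(2014, 1, 6, 12, 0)},
    {"car_id": 5, "place": "cafe", "when": datetime(2014, 1, 6, 13, 15)},
]
car_stops = [
    {"car_id": 5, "place": "shop",
     "start": datetime(2014, 1, 6, 7, 40), "end": datetime(2014, 1, 6, 7, 55)},
]
fixed = reassign_noon_purchases(purchases, car_stops)
print(fixed[0]["when"], fixed[1]["when"])
```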

 

Join CC- and LC-data + get the exact position of the locations  

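
One way to estimate location positions from joined stops and purchases is sketched below in Python. It is a simplification under assumptions: a purchase is joined to a stop of the same car whose time interval contains the purchase time, and the plain mean of the matched stop coordinates stands in for a cluster center. Field names and coordinates are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def locate_shops(purchases, stops):
    """Estimate (lat, lon) per place from stops that coincide with purchases."""
    coords = defaultdict(list)
    for p in purchases:
        for s in stops:
            if s["car_id"] == p["car_id"] and s["start"] <= p["when"] <= s["end"]:
                coords[p["place"]].append((s["lat"], s["lon"]))
    # Centroid of all matched stop positions per place.
    return {
        place: (mean(lat for lat, _ in pts), mean(lon for _, lon in pts))
        for place, pts in coords.items()
    }

# Hypothetical records with made-up coordinates.
day = (2014, 1, 6)
shop_purchases = [
    {"car_id": 1, "place": "cafe", "when": datetime(*day, 7, 35)},
    {"car_id": 2, "place": "cafe", "when": datetime(*day, 7, 40)},
]
gps_stops = [
    {"car_id": 1, "start": datetime(*day, 7, 30), "end": datetime(*day, 7, 50),
     "lat": 36.0, "lon": 24.0},
    {"car_id": 2, "start": datetime(*day, 7, 38), "end": datetime(*day, 7, 55),
     "lat": 36.5, "lon": 24.5},
    {"car_id": 2, "start": datetime(*day, 12, 0), "end": datetime(*day, 13, 0),
     "lat": 37.0, "lon": 25.0},
]
centers = locate_shops(shop_purchases, gps_stops)
print(centers)
```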

 

Calculate stops for people without carID out of CC-data  

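
Generating stops for people without a car ID can be sketched in Python as follows. The starting ID (200) and the stop length (10 minutes) are hypothetical parameters chosen for illustration, as are the field names and sample records.

```python
from datetime import datetime, timedelta

def stops_from_purchases(cc_rows, car_of, first_new_id=200, minutes=10):
    """Create short pseudo-stops for cardholders who have no assigned car."""
    new_ids = {}
    stops = []
    for row in cc_rows:
        name = row["name"]
        if name in car_of:
            continue  # this person already has GPS-derived stops
        if name not in new_ids:
            new_ids[name] = first_new_id + len(new_ids)
        stops.append({
            "id": new_ids[name],
            "place": row["place"],
            "start": row["when"],
            "end": row["when"] + timedelta(minutes=minutes),
        })
    return stops, new_ids

# Hypothetical records, not taken from the challenge data.
day = (2014, 1, 6)
car_of = {"Person A": 1}
card_rows = [
    {"name": "Person A", "place": "cafe", "when": datetime(*day, 7, 30)},
    {"name": "Person B", "place": "cafe", "when": datetime(*day, 7, 45)},
    {"name": "Person C", "place": "shop", "when": datetime(*day, 13, 0)},
    {"name": "Person B", "place": "shop", "when": datetime(*day, 18, 0)},
]
pseudo_stops, assigned_ids = stops_from_purchases(card_rows, car_of)
print(assigned_ids, len(pseudo_stops))
```

Because such stops exist only in the card data, they correspond to the "No GPS" level in the confidence mapping.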

 

 Overview showing layers after data cleaning and transformation
